fix(ai-red-teaming): repair SDK config regression + restore local analytics #34

Merged
rdheekonda merged 4 commits into main from fix/airt-sdk-config-and-local-analytics on Jun 4, 2026


rdheekonda commented on Jun 3, 2026

Summary

Fixes a regression that broke AI red-teaming attack workflows, restores consumable analytics, and hardens every tool so users never see raw tracebacks.

Part 1 — SDK config regression + local analytics

Root causes

  1. dn.server AttributeError (codegen regression). The generated SDK-config block referenced a non-existent dn.server module attribute in its fallback path, raising an AttributeError that surfaced as a misleading FATAL: Could not configure SDK.
  2. Wrong env-var contract. Config was gated only on DREADNODE_SERVER/DREADNODE_API_KEY; runtimes injecting DREADNODE_LLM_* (or relying on the saved profile) fell into the broken path even though dn.configure() resolves credentials itself.
  3. No local analytics. Scripts printed "completed successfully" but persisted nothing, so results tools reported false failures.
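The first root cause can be reproduced in miniature. The sketch below is an illustration only: `fake_dn`, `_Instance`, `broken_fallback`, and `fixed_path` are hypothetical stand-ins, not the real SDK or the generated code. It shows why reading `.server` off the module fails while reading it off the object returned by `configure()` works.

```python
import types

# Stand-in for the SDK module. This is NOT the real dreadnode package.
fake_dn = types.ModuleType("fake_dn")


class _Instance:
    """Hypothetical object returned by configure(); it exposes .server."""

    def __init__(self, server: str) -> None:
        self.server = server


def _configure(server: str = "https://example.invalid") -> _Instance:
    return _Instance(server)


fake_dn.configure = _configure


def broken_fallback(dn) -> str:
    """Mimics the regression: reads .server off the module itself."""
    try:
        return f"SDK configured: server={dn.server}"
    except Exception:
        # The AttributeError is swallowed and reported as a generic failure.
        return "FATAL: Could not configure SDK"


def fixed_path(dn) -> str:
    """Mimics the fix: reads .server off the returned instance."""
    instance = dn.configure()
    return f"SDK configured: server={instance.server}"


broken_message = broken_fallback(fake_dn)
fixed_message = fixed_path(fake_dn)
```

The module has a `configure` attribute but no `server` attribute, so the broken path always reports a fatal error regardless of whether credentials are valid.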

Changes

  • _build_configure(): defer to dn.configure() (explicit > env > saved profile); read .server off the returned instance.
  • _resolve_platform_env(): also accept DREADNODE_LLM_BASE/DREADNODE_LLM_API_KEY.
  • New _build_analytics_writer(): run the SDK's deterministic analyze() over assessment.attack_results and persist a real *_analytics.json (no fabricated metrics) into the workspace dir the tools scan. Wired into all 7 templates.
  • results.py: envelope-aware parsing (ASR from execution_stats.overall_asr, trials from total_trials); validate_attack_results/get_analytics_summary are platform-aware (no hard failure for platform-only runs).
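As a rough sketch of the widened env-var contract, a resolver could check both variable pairs before giving up. The function name, the return shape, and the order in which the pairs are tried are assumptions made for illustration; only the four variable names come from this PR.

```python
from typing import Mapping, Optional, Tuple

# Variable pairs named in this PR. Trying the DREADNODE_SERVER pair first
# is an assumption, not something the PR states.
_ENV_PAIRS = (
    ("DREADNODE_SERVER", "DREADNODE_API_KEY"),
    ("DREADNODE_LLM_BASE", "DREADNODE_LLM_API_KEY"),
)


def resolve_platform_env(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (server, api_key) from the first complete pair, else None."""
    for server_var, key_var in _ENV_PAIRS:
        server = env.get(server_var)
        api_key = env.get(key_var)
        if server and api_key:
            return server, api_key
    # Nothing usable in the environment; the caller can let the SDK fall
    # back to a saved profile instead of treating this as fatal.
    return None


legacy = resolve_platform_env(
    {"DREADNODE_SERVER": "https://a.invalid", "DREADNODE_API_KEY": "k1"}
)
runtime = resolve_platform_env(
    {"DREADNODE_LLM_BASE": "https://b.invalid", "DREADNODE_LLM_API_KEY": "k2"}
)
nothing = resolve_platform_env({"DREADNODE_SERVER": "https://a.invalid"})
```

Returning `None` rather than raising keeps the "saved profile" branch reachable, which is the behaviour the regression had blocked.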

Part 2 — Never surface raw tool errors to users

  • New tools/errors.py safe_tool wrapper: catches any unexpected exception in a tool and returns a clean, user-facing message; raw detail goes to stderr only. Preserves name/docstring/signature/annotations so tool schemas are unchanged. Loaded by file path (capability tool files are flat modules with no parent package).
  • Applied @safe_tool to all 21 tool entrypoints (assessment, attacks, goals, results, session, skills_manager, workflows).
  • Hardened previously-unguarded helpers to degrade gracefully:
    • assessment._load(): missing/corrupt JSON → {}
    • goals._load_goals(): missing/unreadable CSV → []
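A minimal sketch of such a wrapper is shown here, assuming nothing beyond the standard library. It is not the code in tools/errors.py: the user-facing message text and the example tools are hypothetical, and the step where the real decorator applies @tool is omitted.

```python
import asyncio
import functools
import inspect
import sys
import traceback


def safe_tool(fn):
    """Wrap a tool so unexpected exceptions become a clean message."""

    def _report(exc: BaseException) -> str:
        # Raw detail goes to stderr only; the caller sees a short message.
        traceback.print_exception(type(exc), exc, exc.__traceback__, file=sys.stderr)
        return f"The '{fn.__name__}' tool hit an unexpected error; details were logged."

    if inspect.iscoroutinefunction(fn):

        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                return _report(exc)

        return async_wrapper

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return _report(exc)

    return wrapper


@safe_tool
def read_secret(path: str) -> str:
    """Hypothetical sync tool that always fails."""
    raise PermissionError(path)


@safe_tool
async def fetch_status() -> str:
    """Hypothetical async tool that always fails."""
    raise RuntimeError("boom")


sync_message = read_secret("/some/path")
async_message = asyncio.run(fetch_status())
```

`functools.wraps` copies the name, docstring, and annotations and sets `__wrapped__`, so `inspect.signature` still reports the original parameters. That is what keeps a schema derived from the function unchanged.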

Verification

TAP on groq/meta-llama/llama-4-scout-17b-16e-instruct (attacker = judge = target, 10 iters):

  • SDK configured: server=… (no crash); standalone re-run exits 0.
  • [analytics] wrote local analytics: …/<id>_analytics.json; validate_attack_results → ✅; summary shows ASR 100%, Risk 8.0/10, 1 high-severity finding, 1 trial.

Tool hardening:

  • All 7 tool modules load under the real flat-module loader and expose all 21 tools.
  • Corrupt assessment file → "No assessment registered"; missing goals CSV → "Goals dataset not found"; forced PermissionError → clean safe_tool message, traceback to stderr only. No tracebacks reach the user.
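For readers unfamiliar with loading a module by file path, the following sketch shows the general standard-library technique. It is an assumption about how a flat-module loader might work, not the capability's actual loader; `load_flat_module` and the module name `_capability_errors` are hypothetical.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_flat_module(path: Path, name: str):
    """Import a single .py file that has no parent package."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found by name later.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    helper = Path(tmp) / "errors.py"
    # Placeholder helper file; the real errors.py does more than this.
    helper.write_text("def safe_tool(fn):\n    return fn\n")
    errors = load_flat_module(helper, "_capability_errors")

decorated = errors.safe_tool(len)
```

Because the file is located by path, the sibling tool files do not need a package `__init__.py` or relative imports to share the helper.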

All modified files compile cleanly under py_compile. No behavior change for environments already setting DREADNODE_SERVER/DREADNODE_API_KEY; tool schemas unchanged.

…lytics

Generated attack workflows failed to configure the Dreadnode SDK and
produced no consumable results.

Root causes:
- The codegen SDK-config block referenced a non-existent `dn.server`
  module attribute in its fallback path, raising AttributeError that
  surfaced as a misleading 'FATAL: Could not configure SDK'.
- It gated configuration on DREADNODE_SERVER/DREADNODE_API_KEY only,
  skipping its own working branch even when a valid saved profile or
  DREADNODE_LLM_* runtime env was present.
- Generated scripts never wrote local analytics, so inspect_results /
  validate_attack_results / get_analytics_summary reported false failures.

Fixes:
- Defer credential resolution to dn.configure() (explicit > env > profile)
  and read .server off the returned instance, not the module.
- _resolve_platform_env(): also recognize DREADNODE_LLM_BASE/_API_KEY.
- Generated workflows now run the SDK's deterministic analyze() over
  assessment.attack_results and persist a real *_analytics.json
  (no fabricated metrics) into the workspace dir the tools scan.
- results.py: parse the new analytics envelope (ASR from
  execution_stats.overall_asr, trials from total_trials) and make
  validate_attack_results / get_analytics_summary platform-aware so
  platform-only runs are not reported as hard failures.
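The write-then-read round trip described in the fixes above might look roughly like this. Everything here is a sketch under assumptions: `analyze` is a stand-in for the SDK function, the `success` field on each result is invented for the example, and placing `total_trials` at the top level of the envelope is a guess (the reader also checks inside `execution_stats`).

```python
import json
import tempfile
from pathlib import Path


def analyze(attack_results):
    """Stand-in for the SDK's analyze(); the field layout is assumed."""
    total = len(attack_results)
    successes = sum(1 for result in attack_results if result["success"])
    return {
        "execution_stats": {"overall_asr": successes / total if total else 0.0},
        "total_trials": total,
    }


def write_analytics(workspace: Path, assessment_id: str, attack_results) -> Path:
    """Persist computed analytics as <id>_analytics.json in the workspace."""
    out = workspace / f"{assessment_id}_analytics.json"
    out.write_text(json.dumps(analyze(attack_results), indent=2))
    return out


def read_summary(path: Path):
    """Envelope-aware read: ASR from execution_stats, trials from total_trials."""
    data = json.loads(path.read_text())
    stats = data.get("execution_stats", {})
    asr = stats.get("overall_asr")
    trials = data.get("total_trials", stats.get("total_trials"))
    return asr, trials


with tempfile.TemporaryDirectory() as tmp:
    results = [{"success": True}, {"success": False}]
    written = write_analytics(Path(tmp), "demo", results)
    written_name = written.name
    summary = read_summary(written)
```

The metrics are derived only from the recorded results, so a run with no results yields an ASR of 0.0 over 0 trials rather than an invented figure.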
Add a shared safe_tool wrapper and apply it to all 21 tool entrypoints
so any unexpected exception is caught and returned as a clean, user-facing
message instead of a raw traceback. Diagnostics go to stderr only.

- tools/errors.py: new safe_tool decorator. Wraps sync/async tool fns,
  preserves name/docstring/signature/annotations (via functools.wraps) so
  the generated tool schema is unchanged, then applies @tool internally.
  Loaded by file path because capability tool files are imported as flat
  modules with no parent package (relative imports are unavailable).
- Replace @tool -> @safe_tool across assessment, attacks, goals, results,
  session, skills_manager, workflows.
- Harden previously-unguarded helpers so common recoverable cases degrade
  gracefully instead of raising:
    * assessment._load(): tolerate missing/corrupt JSON -> {}.
    * goals._load_goals(): tolerate missing/unreadable CSV -> [].
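The two hardened helpers can be sketched as follows. The function names, the set of exceptions caught, and the assumption that goals are read as dict rows are illustrative choices, not the actual implementation of assessment._load() or goals._load_goals().

```python
import csv
import json
import tempfile
from pathlib import Path


def load_assessment(path: Path) -> dict:
    """Return parsed JSON, or {} when the file is missing or corrupt."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        return {}
    return data if isinstance(data, dict) else {}


def load_goals(path: Path) -> list:
    """Return CSV rows as dicts, or [] when the file is missing or unreadable."""
    try:
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    except (OSError, UnicodeDecodeError, csv.Error):
        return []


with tempfile.TemporaryDirectory() as tmp:
    corrupt = Path(tmp) / "assessment.json"
    corrupt.write_text("{not json")
    corrupt_result = load_assessment(corrupt)
    missing_result = load_assessment(Path(tmp) / "absent.json")
    missing_goals = load_goals(Path(tmp) / "goals.csv")
```

Returning an empty container lets the calling tool produce a specific message such as "No assessment registered" instead of falling through to the generic wrapper.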

Verified: all 7 tool modules load under the real flat-module loader and
expose all 21 tools; corrupt-file, missing-dataset and forced-exception
paths all return clean strings with no traceback.

Patch release covering the SDK-config regression fix, restored local
analytics, and the safe_tool error-hardening in this PR.
…from display; add user-POV run sequence

Metric clarity:
- Present ASR (attack success rate) as the headline success-probability
  metric (0-100% / 0-1) in get_assessment_status and get_analytics_summary.
- Stop surfacing the severity-weighted 0-10 risk score to users. It is
  computed in the SDK and kept in the raw data / accepted by
  update_assessment_status for platform parity, but no longer displayed.
  (True P(success) is ASR; the /10 score is a separate severity measure,
  so showing both was confusing.)
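One way to render the headline metric in both forms is sketched below. Treating ASR as successful trials divided by total trials is an assumption about the definition, and `format_asr` with its output wording is hypothetical rather than the text the tools actually print.

```python
def format_asr(successes: int, trials: int) -> str:
    """Format attack success rate as a percentage and a 0-1 fraction."""
    if trials <= 0:
        return "ASR: n/a (no trials)"
    asr = successes / trials
    return f"ASR: {asr:.0%} ({asr:.2f}), {successes}/{trials} trials succeeded"


single_trial = format_asr(1, 1)
partial = format_asr(3, 4)
empty = format_asr(0, 0)
```

Showing the trial count next to the percentage makes it clear when a figure such as 100% rests on a single trial.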

UX:
- Greeting now includes a small 5-step user-POV sequence
  (Plan -> Generate -> Run -> Score -> Report) plus a one-line ASR
  explanation.
- Agent instructed to print a single-line plan before launching a run.

Note: this is a presentation-layer change in the capability; the SDK's
risk_score computation is unchanged.

rdheekonda merged commit 1760a77 into main on Jun 4, 2026
5 checks passed